Sequence Space Localization in the Immune System Response to Vaccination and Disease
We introduce a model of protein evolution to explain limitations in the immune system response
to vaccination and disease. The phenomenon of original antigenic sin, wherein vaccination creates 
memory sequences that can increase susceptibility to future exposures to the same disease, is ex- 
plained as stemming from localization of the immune system response in antibody sequence space. 
This localization is a result of the roughness in sequence space of the evolved antibody affinity 
constant for antigen and is observed for diseases with high year-to-year mutation rates, such as 
influenza. 

PACS numbers: 87.10.+e, 87.15.Aa, 87.17.-d, 87.23.Kg
INTRODUCTION

Our immune system protects us against death by infection [1].
A major component of the immune system
is generation of antibodies, protein molecules that bind 
specific antigens. To recognize invading pathogens, the 
immune system performs a search of the amino acid se- 
quence space of possible antibodies. To find useful anti- 
bodies in the effectively infinite protein sequence space, 
the immune system has evolved a hierarchical strategy. 
The first step involves creating the DNA sequences for 
B cells that code for moderately effective antibodies 
through rearrangement of immune-system-specific gene 
fragments from the genome [2]. This process is called
VDJ recombination. This combinatorial process can pro- 
duce on the order of 10^''^ different antibodies through 
recombination of pieces of antibodies. The second step, 
which occurs when a specific antigen invades our body, 
is somatic hypermutation. Somatic hypermutation is the 
process of mutation that occurs when the B cells that pro- 
duce the antibodies divide and multiply. Only those B 
cells that produce antibodies that bind the antigen with 
higher affinity are propagated by this mutation and selec- 
tion process, and another name for this process is affin- 
ity maturation. Somatic hypermutation is essentially a 
search of the amino acid sequence space at the level of 
individual point mutations [3].

The consequence of an immune system response to 
antigen is the establishment of a state of memory [4].
Immunological memory is the ability of the immune sys- 
tem to respond more rapidly and effectively to antigens 
that have been encountered previously. Specific memory 
is maintained in the DNA of long-lived memory B cells 
that can persist without residual antigen. 

Although our immune system is highly effective, some 
limitations have been reported. The phenomenon known 
as "original antigenic sin" is the tendency for antibodies 
produced in response to exposure to influenza virus anti- 
gens to suppress the creation of new, different antibodies 
in response to exposure to different versions of the flu [5].
Roughly speaking, the immune system responds only to 
the antigen fragments, or epitopes, that are in common 
with the original flu virus. As a result, individuals vac- 
cinated against the flu may become more susceptible to 
infection by mutated strains of the flu than would in- 
dividuals receiving no vaccination. The details of how 
original antigenic sin works, even at a qualitative level, 
are unknown. 

In this Letter, we offer an explanation for the reported 
limitations in the immune system response using a model 
of protein evolution. We describe the dynamics of affin- 
ity maturation by a search for increased binding con- 
stants between antibody and antigen in antibody se- 
quence space. It is shown that an immune system re- 
sponse to an antigen generates localized memory B cell 
sequences. This set of localized sequences reduces the 
ability of the immune system to respond to subsequent 
exposures to different but related antigens. It is this com- 
petitive process between memory sequences and the VDJ 
recombinations of secondary exposure that is responsible 
for the reported limitations in the immune system. 

We use a random energy model to represent the inter- 
action between the antibodies and the influenza proteins. 
This model captures the essence of the correlated rugged- 
ness of the interaction energy in the variable space, the 
variables being the antibody amino acid sequences and 
the identity of the disease proteins, and the correlations 
being mainly due to the physical structure of the anti- 
bodies. The random energy model allows study of the 
sequence-level dynamics of the immune/antigen system, 
which would otherwise be an intractable problem at the 
atomic scale, with 10^4 atoms per antibody, 10^8 antibodies
per individual, 6 x 10^9 individuals, and many possible
influenza strains. Use of random energy theory to
treat correlations in otherwise intractable physical sys- 
tems goes back at least to Bohr's random matrix theory 
for nuclear cross sections [6] and has been used for quantum
chaos, disordered mesoscopic systems, QCD, and
quantum gravity [7]. Close to the present application is
the study of spin glasses by random energy models [8, 9],
protein folding by coarse-grained models [10, 11], and
evolutionary systems by NK-type models [12, 13, 14, 15].

In detail, the generalized NK model we use considers
three different kinds of interactions within an antibody:
interactions within a subdomain (U^{sd}), interactions between
subdomains (U^{sd-sd}), and direct binding interaction
between antibody and antigen (U^{c}). In the context
of protein evolution, parameters of the model have been
calibrated [12, 16, 17]. The energy function of a protein
is given by

U = \sum_{i=1}^{M} U^{sd}_{\alpha_i} + \sum_{i>j=1}^{M} U^{sd-sd}_{ij} + \sum_{i=1}^{P} U^{c}_{i} , (1)

where M is the number of antibody secondary structural
subdomains, and P is the number of antibody amino
acids contributing directly to the binding. The subdomain
energy U^{sd}_{\alpha_i} is

U^{sd}_{\alpha_i} = \frac{1}{\sqrt{M (N-K+1)}} \sum_{j=1}^{N-K+1} \sigma_{\alpha_i}(a_j, a_{j+1}, ..., a_{j+K-1}) , (2)

where N is the number of amino acids in a subdomain,
a_j is the amino acid type at position j of the subdomain,
and K is the range of local interaction within a subdomain.
All subdomains belong to one of L = 5 different
types (e.g., helices, strands, loops, turns, and others).
The quenched Gaussian random number \sigma_{\alpha_i} is different
for each value of its argument for a given subdomain
type, \alpha_i. All of the Gaussian \sigma values have zero mean
and unit variance. The energy of interaction between
secondary structures is

U^{sd-sd}_{ij} = \sqrt{\frac{2}{D M (M-1)}} \sum_{k=1}^{D} \sigma^{(k)}_{ij}(a^{(i)}_{j_1}, ..., a^{(i)}_{j_{K/2}}; a^{(j)}_{j_{K/2+1}}, ..., a^{(j)}_{j_K}) . (3)

We set the number of interactions between secondary
structures at D = 6. Here \sigma^{(k)}_{ij} and the interacting
amino acids, j_1, ..., j_K, are selected at random for each
interaction (k, i, j). The chemical binding energy of
each antibody amino acid to the antigen is given by
U^{c}_{i} = \sigma_i(a_i)/\sqrt{P}. The contributing amino acid, i, and
the unit-normal weight of the binding, \sigma_i, are chosen
at random. Using experimental results, we take P = 5
amino acids to contribute directly to the binding event.
Here we consider only five chemically distinct amino acid
classes (e.g., negative, positive, polar, hydrophobic, and
other) since each different type of amino acid behaves as
a completely different chemical entity within the random
energy model.
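
As a rough illustration of how the three energy terms combine, the following Python sketch evaluates a generalized NK-type energy for a single sequence. It is a minimal sketch under stated assumptions, not the calibrated model: the normalization factors (chosen here so that each of the three terms has unit variance), the even split of the K interacting sites between the two subdomains of a pair, the sharing of the within-subdomain table across antigens, and all names (GaussianTable, make_antigen, energy) are assumptions introduced for this example.

```python
import math
import random

M, N, K, D, P = 10, 10, 4, 6, 5   # subdomains, subdomain length, range, couplings, binding sites
N_TYPES = 5                        # subdomain types
N_CLASSES = 5                      # amino acid classes


class GaussianTable:
    """Quenched N(0, 1) numbers: drawn once per distinct key, then fixed."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self._values = {}

    def __call__(self, key):
        if key not in self._values:
            self._values[key] = self._rng.gauss(0.0, 1.0)
        return self._values[key]


# Assumption: the within-subdomain table is shared by all antigens.
SIGMA_SD = GaussianTable(seed=0)


def make_antigen(seed):
    """Draw one hypothetical parameter set, standing in for one antigen."""
    rng = random.Random(seed)
    couplings = {}
    for i in range(M):
        for j in range(i):
            # D couplings per pair; assumed K/2 random sites in each subdomain.
            couplings[(i, j)] = [
                ([rng.randrange(N) for _ in range(K // 2)],
                 [rng.randrange(N) for _ in range(K // 2)])
                for _ in range(D)
            ]
    return {
        "types": [rng.randrange(N_TYPES) for _ in range(M)],
        "couplings": couplings,
        "sites": rng.sample(range(M * N), P),
        "sigma_sdsd": GaussianTable(rng.getrandbits(32)),
        "sigma_c": GaussianTable(rng.getrandbits(32)),
    }


def energy(seq, antigen):
    """Total energy U = U_sd + U_sdsd + U_c of a sequence of M*N class labels."""
    sub = [seq[i * N:(i + 1) * N] for i in range(M)]

    u_sd = 0.0
    for i, s in enumerate(sub):
        for j in range(N - K + 1):
            u_sd += SIGMA_SD((antigen["types"][i], tuple(s[j:j + K])))
    u_sd /= math.sqrt(M * (N - K + 1))

    u_sdsd = 0.0
    for (i, j), pair_couplings in antigen["couplings"].items():
        for k, (sites_i, sites_j) in enumerate(pair_couplings):
            key = (i, j, k,
                   tuple(sub[i][x] for x in sites_i),
                   tuple(sub[j][x] for x in sites_j))
            u_sdsd += antigen["sigma_sdsd"](key)
    u_sdsd *= math.sqrt(2.0 / (D * M * (M - 1)))

    u_c = sum(antigen["sigma_c"]((site, seq[site])) for site in antigen["sites"])
    u_c /= math.sqrt(P)
    return u_sd + u_sdsd + u_c
```

The random numbers are drawn lazily, so a full table over all possible arguments is never stored; once drawn, a value stays fixed, which is what makes the disorder quenched within a run.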

The generalized NK model, while a simplified descrip- 
tion of real proteins, captures much of the thermodynam- 
ics of protein folding and ligand binding. In the model, 
a specific B cell repertoire is represented by a specific set 
of amino acid sequences. Moreover, a specific instance
of the random parameters within the model represents a
specific antigen. An immune response that finds a B cell 
that produces an antibody with high affinity constant to 
a specific antigen corresponds in the model to finding a 
sequence having a low energy for a specific parameter set. 

The random character of the generalized NK model 
makes the energy rugged in antibody sequence space. 
The energy is, moreover, correlated by the local antibody
structure (K = 4), the secondary antibody structure
(U^{sd-sd}), and the interaction with the influenza proteins
(U^{c}). As the immune system explores the space of
possible antibodies, localization is possible if the corre- 
lated ruggedness of the interaction energy is sufficiently 
great. 

Since the variable region in each light and heavy chain 
of an antibody is about 100 amino acids long, and most 
of the binding occurs in the heavy chain, we choose a 
sequence length of 100. We choose M = 10 since there 
are roughly 10 secondary structures in a typical antibody 
and thus choose N = 10. The immune system contains on
the order of 10^8 B cells divided into different specificities,
and the frequency of a specific B cell participating in the
initial immune response is roughly 1 in 10^5 [18]. Hence,
we use 10^3 sequences during an immune response.

The hierarchical strategy of the immune system is used 
to search the antibody sequence space for high affinity an- 
tibodies. Initial combination of optimized subdomains is 
followed by a point mutation and selection procedure [18].
To mimic combinatorial joining of gene segments during 
B cell development, we produce a naive B cell repertoire 
by choosing each subdomain sequence from pools that 
have N_pool amino acid segments obtained by minimizing
the appropriate U^{sd}. To fit the theoretical heavy-chain
diversity of 3 x 10^11 [18], we choose N_pool = 3 sequences
among the top 300 sequences for each subdomain type.

Somatic hypermutation occurs at the rate of roughly
one mutation per variable region of the light and heavy
chains per cell division, which occurs every 6 to 8 hours
during intense cell proliferation [19]. Hence, in our simulation,
we do 0.5 point mutations per sequence, keep
the best (highest affinity) x = 20% of the sequences, and then
amplify these back up to a total of 10^3 copies in one
round, which corresponds to 1/3 day. That is, the probability
of picking one of the, possibly mutated, sequences
for the next round is p_select = 1/200 for U <= U_200 and
p_select = 0 for U > U_200, where U_200 is the 200th best
energy of the 10^3 sequences after the mutation events, and
this equation is employed 10^3 times to select randomly
the 10^3 sequences for the next round. Given a specific
antigen, i.e., a specific set of interaction parameters, we
do 30 rounds (10 days) of point mutation and selection
in one immune response. In this way, memory B cells for
the antigen are generated.
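
The mutation and selection round described above can be sketched in a few lines of Python. This is an illustrative sketch, not the simulation code behind the results: reading "0.5 point mutations per sequence" as a single mutation applied with probability 0.5 is an assumption, the energy function is passed in as an arbitrary callable, and toy_energy is a placeholder used only to make the example runnable.

```python
import random

N_CLASSES = 5          # amino acid classes
KEEP_FRACTION = 0.20   # selection strength x
MUTATION_RATE = 0.5    # mean point mutations per sequence per round
ROUNDS = 30            # 30 rounds of 1/3 day each, i.e. 10 days


def mutate(seq, rng):
    """With probability MUTATION_RATE, change one random site to another class."""
    seq = list(seq)
    if rng.random() < MUTATION_RATE:
        pos = rng.randrange(len(seq))
        seq[pos] = rng.choice([c for c in range(N_CLASSES) if c != seq[pos]])
    return seq


def one_round(population, energy, rng):
    """Mutate, keep the lowest-energy fraction, resample to the original size."""
    mutated = sorted((mutate(s, rng) for s in population), key=energy)
    survivors = mutated[: int(KEEP_FRACTION * len(mutated))]
    # Each survivor is equally likely on every draw (1/200 when 200 survive).
    return [rng.choice(survivors) for _ in range(len(population))]


def immune_response(population, energy, rng, rounds=ROUNDS):
    """Run the given number of mutation and selection rounds."""
    for _ in range(rounds):
        population = one_round(population, energy, rng)
    return population


def toy_energy(seq):
    """Placeholder energy (lower is better); not the generalized NK model."""
    return -float(sum(seq))


rng = random.Random(0)
naive = [[rng.randrange(N_CLASSES) for _ in range(20)] for _ in range(1000)]
memory = immune_response(naive, toy_energy, rng)
```

Sampling with replacement from the 200 survivors reproduces the selection probability of 1/200 per draw while keeping the population at 1000 sequences.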

The affinity constant is given as a function of energy, 

K^{eq} = exp(a - bU) , (4)

where a and b are determined by the dynamics of the
mutation and selection process. Affinity constants resulting
from VDJ recombination are roughly 10^4, affinity
constants after the first response of affinity maturation
are roughly 10^6, and affinity constants after a second
response of affinity maturation are roughly 10^7 [18] (values
that fix the selection strength, x). By comparison to the
dynamics of the model, we obtain a = -18.56, b = 1.67.
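
To make the exponential relation between energy and affinity concrete, the short Python sketch below evaluates Eq. (4) with the fitted constants and inverts it. The function names are introduced here for illustration only. With b = 1.67, each tenfold gain in affinity corresponds to lowering the energy by ln(10)/b, roughly 1.38 in the model's energy units.

```python
import math

A, B = -18.56, 1.67   # fitted constants a and b quoted in the text


def affinity(energy):
    """Affinity constant K = exp(a - b U)."""
    return math.exp(A - B * energy)


def energy_for(affinity_constant):
    """Invert the relation: the energy U that yields a given affinity constant."""
    return (A - math.log(affinity_constant)) / B


# Energy decrease needed for a tenfold gain in affinity.
ENERGY_PER_DECADE = math.log(10.0) / B
```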

The memory B cells, key to immunological memory, 
give rapid and effective response to the same antigen 
due to their increased affinity for previously-encountered 
antigens. We focus on the role of the memory cells in 
the immune response to different antigens. The distance 
between a first antigen and the second antigen is given 
by the probability, p, of changing parameters of interaction
within the subdomain (U^{sd}), subdomain-subdomain
(U^{sd-sd}), and chemical binding (U^{c}) terms. Within U^{sd},
we change only the subdomain type, \alpha_i, not the parameters
\sigma_\alpha, which are probably fixed by structural biology
and should be independent of the antigen.
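
One way to picture this definition of distance is as an independent coin flip per parameter. The Python sketch below is a simplification introduced here: it treats the antigen as a flat list of unit-normal numbers and redraws each entry with probability p, whereas in the model the subdomain types are discrete labels and would be redrawn from the set of types instead. The function name is hypothetical.

```python
import random


def mutate_antigen(params, p, rng):
    """Redraw each unit-normal parameter independently with probability p."""
    return [rng.gauss(0.0, 1.0) if rng.random() < p else value
            for value in params]


rng = random.Random(0)
first_antigen = [rng.gauss(0.0, 1.0) for _ in range(1000)]
second_antigen = mutate_antigen(first_antigen, 0.3, rng)
```

With p = 0 the second antigen is identical to the first, and with p = 1 it is statistically independent of it; intermediate values interpolate between the two limits.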

We estimate the number of memory and naive B cells 
that participate in the immune response to the second 
antigen by the ratio of the respective affinity constants. 
From the definition of the affinity constant, K^{eq} =
[Antigen:Antibody] / ([Antigen] [Antibody]), the binding
probability is proportional to the affinity constant
and to the concentration of antigen-specific antibody,
which is 10^2 times greater for the memory sequences [18].
We measure the average affinity for the second antigen
of the 10^3 memory cells, K^{m}, and that of the 10^3 B
cells from the naive repertoire of optimized subdomain
sequences, K^{n}. The ratio 10^2 K^{m}/K^{n} gives the ratio of
memory cells to naive cells. For exposure to the second
antigen, we perform 30 rounds (10 days) of point mutation
and selection, starting with 10^5 K^{m}/(10^2 K^{m} + K^{n})
memory cells and 10^3 K^{n}/(10^2 K^{m} + K^{n}) naive cells,
since both memory and naive sequences participate in
the secondary response [20].
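
The split of the starting population between memory and naive cells follows directly from weighting each class by its concentration and its average affinity. The Python sketch below illustrates that bookkeeping; the function name is hypothetical, the concentration factor is exposed as a parameter and set to 100 as an assumption of this example, and rounding to whole cells is a choice made here.

```python
def secondary_split(k_memory, k_naive, n_total=1000, concentration_factor=100.0):
    """Return (memory, naive) cell counts that start the secondary response.

    Each class is weighted by concentration times average affinity for the
    second antigen; the counts always sum to n_total.
    """
    weight_memory = concentration_factor * k_memory
    fraction_memory = weight_memory / (weight_memory + k_naive)
    n_memory = round(n_total * fraction_memory)
    return n_memory, n_total - n_memory


# Equal average affinities: memory cells dominate through concentration alone.
print(secondary_split(1e4, 1e4))   # (990, 10)
```

Under these assumptions the two classes start in equal numbers only when the memory affinity for the second antigen has dropped a factor of 100 below the naive affinity.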

Figure 1 shows the evolved affinity constant to a second
antigen if the exposure to a first antigen exists (solid
line) or not (dashed line) as a function of the difference
between the first and second antigen, p, or "antigenic
distance" [21]. When the difference is small, the exposure
to a first antigen leads to a higher affinity constant than
without exposure, which is why immune system memory
and vaccination are effective. For a large difference,
the antigen encountered in the first exposure is uncorre- 
lated with that in the second exposure, and so immune 
system memory does not play a role. Interestingly, the 
immunological memory from the first exposure actually 
gives worse protection, i.e. a lower affinity constant, for 
intermediate differences — which is original antigenic sin. 

FIG. 1: The evolved affinity constant to a second antigen
after exposure to an original antigen that differs by probability
p (solid line). The dotted line represents the affinity
constant without previous exposure. The affinity constant is
generated by exponentiating, as in Eq. (4), the average of the
best binding energy, using 5000 instances of the model. The
inset shows the affinity of the memory sequences for the
mutated antigen.

The dynamics of the immune response (Fig. 1) depend
upon the constants of Nature, i.e., the parameters of the
model. For example, in an organism with a smaller immune
system, such as the mouse, there are fewer starting
sequences and less favorable binding constants are
measured in the same number of rounds: a factor of 0.5 
reduction in the number of starting sequences leads to a 
0.64 reduction in the evolved binding constant, but a similar
degree of original antigenic sin as in Fig. 1. On the
other hand, if more rounds are performed, better bind- 
ing constants are found in the primary and secondary 
responses, but the secondary response is not as improved 
over the primary response as when using 30 rounds, be- 
cause the evolved sequences are becoming more localized 
in ever deeper wells. The degree of original antigenic sin 
is, however, similar in the range of 30 to 60 rounds per 
response. Similarly, if the roughness of the energy upon 
disease mutation is increased, for example by assuming 
that mutation of the influenza actually changes the \sigma_\alpha,
the degree of original antigenic sin increases substantially, 
by a factor of 2, because the barriers between the regions 
of localization in sequence space are increased. On the 
other hand, if the concentration of the memory cells is 
decreased, the contribution of the memory cells to the 
dynamics is reduced, and the original antigenic sin phe- 
nomenon decreases, almost disappearing when the mem- 
ory and naive antibody concentrations are equal. The 
average number of mutations leading to the best antibody,
a measure of the localization length when original
antigenic sin occurs, is 15 for the first response and
rises from 5 to 15 for the second response in the range
0 < p < 0.30. Interestingly, the average number of mutations
rises slightly above 15 in the range 0.30 < p < 0.70,
indicating that in the original antigenic sin region more
mutations are necessary for the compressed ensemble of
memory sequences from the primary response to evolve to
a suitable new state in the secondary response. Larger
selection strengths (x < 20%) cause more localization
and original antigenic sin, in shallower wells for small x,
and smaller strengths (x > 20%) lead to less evolution.

For small values of p, p < 0.19, the memory B cells
produce antibodies with higher affinities, K^{eq} > 10^4,
for the new antigen than do naive B cells. The binding
constant of the memory antibodies steadily decreases
with p, reaching the non-specific value of K^{eq} = 10^2 at
p = 0.36 (see inset in Fig. 1), which, interestingly, is less
than the range to which original antigenic sin extends, 
0.23 < p < 0.60. These model predictions are in good 
agreement with experimental data on cross-reactivity, 
which ceases to occur when the amino acid sequences 
are more than 33-42% different [22].

The ineffectiveness of immune system memory over a 
window of p values can be understood from the localization
of memory B cell sequences. Figure 2 displays
distributions of memory and naive affinity constants for 
the second antigen. Notice that the memory sequences 
are highly homogeneous and lack diversity compared to 
the naive sequences. Indeed, original antigenic sin arises 
mainly because the memory sequences from the primary 
response suppress use of naive sequences in the tail of the 
distribution for the secondary response. Although those 
naive sequences initially look unpromising, they may actually
evolve to sequences with superior binding constants.
The inset to Fig. 2 illustrates this phenomenon.
Interestingly, when half of the distribution is removed,
the reduction in the binding constant is just about that
which occurs in original antigenic sin (Fig. 1).

In summary, the generalized NK model is shown to suc- 
cessfully model immune system dynamics. A localization 
mechanism for the original antigenic sin phenomenon ob- 
served in the flu is explained. Localization of antibod- 
ies in the amino acid sequence space around memory B 
cell sequences is shown to lead to a decreased ability of 
the immune system to respond to diseases with year-to- 
year mutation rates within a critical window. This local- 
ization occurs because of the ruggedness of the evolved 
affinity constant in amino acid sequence space. From 
the model dynamics, we find that memory sequences can 
both outcompete the non-vaccinated immune response 
and become trapped in local minima. Memory sequences 
with affinity constants initially superior to those from 
naive sequences can be selected by the dynamics, and 
these memory sequences can lead to poorer evolved affin- 
ity constants, to the detriment of the immune system for 
intermediate disease mutation rates. These results sug- 
gest several implications for vaccination strategy: the dif- 
ference between vaccinations administered on a repeated 
basis should be as great as practicable, and suppression 
of the memory B cell response may be helpful during 
vaccination against highly variable antigens. 

FIG. 2: The affinity distribution of antibodies from naive
B cells (hatched) or memory cells (open) for the antigen of 
second exposure. In the inset is shown the binding constant 
that evolves in the primary response when an initial selection 
step retains only the top fraction of the naive cells. 



M.W.D. thanks Jeong-Man Park for stimulating dis- 
cussion. This research was supported by the National 
Science and Camille & Henry Dreyfus Foundations. 

Corresponding e-mail: mwdeem@rice.edu. 



[1] A. S. Perelson and G. Weisbuch, Rev. Mod. Phys. 69, 1219 (1997).
[2] S. Tonegawa, Nature 302, 575 (1983).
[3] G. M. Griffiths, C. Berek, M. Kaartinen, and C. Milstein, Nature 312, 271 (1984).
[4] D. Gray, Annu. Rev. Immunol. 11, 49 (1993).
[5] S. Fazekas de St. Groth and R. G. Webster, J. Exp. Med. 124, 331 (1966).
[6] N. Bohr, Nature 137, 344 (1936).
[7] T. Guhr, A. Müller-Groeling, and H. A. Weidenmüller, Phys. Rep. 299, 189 (1998).
[8] D. Sherrington and S. Kirkpatrick, Phys. Rev. Lett. 35, 1792 (1975).
[9] B. Derrida, Phys. Rev. Lett. 45, 79 (1980).
[10] J. D. Bryngelson and P. G. Wolynes, Proc. Natl. Acad. Sci. USA 84, 7524 (1987).
[11] A. M. Gutin and E. I. Shakhnovich, J. Chem. Phys. 98, 8174 (1993).
[12] L. D. Bogarad and M. W. Deem, Proc. Natl. Acad. Sci. USA 96, 2591 (1999).
[13] B. Derrida and L. Peliti, Bull. Math. Biol. 53, 355 (1991).
[14] G. Weisbuch, J. Theor. Biol. 143, 507 (1990).
[15] B. Drossel, Adv. Phys. 50, 209 (2001).
[16] S. Kauffman and S. Levin, J. Theor. Biol. 128, 11 (1987).
[17] A. S. Perelson and C. A. Macken, Proc. Natl. Acad. Sci. USA 92, 9657 (1995).
[18] C. A. Janeway, P. Travers, M. Walport, and M. Shlomchik, Immunobiology (Garland Publishing, New York, 2001), 5th ed.
[19] D. L. French, R. Laskov, and M. D. Scharff, Science 244, 1152 (1989).
[20] C. Berek and C. Milstein, Immunol. Rev. 96, 23 (1987).
[21] D. J. Smith, S. Forrest, D. H. Ackley, and A. S. Perelson, Proc. Natl. Acad. Sci. USA 96, 14001 (1999).
[22] J. J. East, P. E. E. Todd, and S. J. Leach, Mol. Immunol. 17, 1545 (1980).



